我们提出Dave Aquatic Virtual Environals(Dave),这是用于水下机器人,传感器和环境的开源仿真堆栈。传统的机器人模拟器并非旨在应对海洋环境带来的独特挑战,包括但不限于在空间和时间上变化的环境条件,受损或具有挑战性的感知以及在通常未探索的环境中数据的不可用。考虑到各种传感器和平台,对于不可避免地抵制更广泛采用的特定用例,车轮通常会重新发明。在现有模拟器的基础上,我们提供了一个框架,以帮助加快算法的开发和评估,否则这些算法需要在海上需要昂贵且耗时的操作。该框架包括基本的构建块(例如,新车,水跟踪多普勒速度记录仪,基于物理的多微型声纳)以及开发工具(例如,动态测深的产卵,洋流),使用户可以专注于方法论,而不是方法。比软件基础架构。我们通过示例场景,测深数据导入,数据检查的用户界面和操纵运动计划以及可视化来演示用法。
translated by 谷歌翻译
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask-inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, the existing prompt tuning methods may lack generalization. We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning. The novel component of SPT is a memory bank from where memory prompts are retrieved based on discrete prompts. Extensive experiments, such as (i) fine-tuning a full language model with SPT on 31 different tasks from 8 different domains and evaluating zero-shot generalization on 9 heldout datasets under 5 NLP task categories and (ii) pretraining SPT on the GLUE datasets and evaluating fine-tuning on the SuperGLUE datasets, demonstrate effectiveness of SPT.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpus. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance level annotations of persuasion strategy, and game level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
translated by 谷歌翻译
Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel $\rho$-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in RAW-domain using RAW-domain YOLOv3 and RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed \r{ho}-Vision generalizes across various camera sensors and different task-specific models. Additional advantages of the proposed $\rho$-Vision that eliminates the ISP are the potential reductions in computations and processing times.
translated by 谷歌翻译
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Through extensive sensitivity analysis by varying the ratio uncertainty level, our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
translated by 谷歌翻译
Over the years, Machine Learning models have been successfully employed on neuroimaging data for accurately predicting brain age. Deviations from the healthy brain aging pattern are associated to the accelerated brain aging and brain abnormalities. Hence, efficient and accurate diagnosis techniques are required for eliciting accurate brain age estimations. Several contributions have been reported in the past for this purpose, resorting to different data-driven modeling methods. Recently, deep neural networks (also referred to as deep learning) have become prevalent in manifold neuroimaging studies, including brain age estimation. In this review, we offer a comprehensive analysis of the literature related to the adoption of deep learning for brain age estimation with neuroimaging data. We detail and analyze different deep learning architectures used for this application, pausing at research works published to date quantitatively exploring their application. We also examine different brain age estimation frameworks, comparatively exposing their advantages and weaknesses. Finally, the review concludes with an outlook towards future directions that should be followed by prospective studies. The ultimate goal of this paper is to establish a common and informed reference for newcomers and experienced researchers willing to approach brain age estimation by using deep learning models
translated by 谷歌翻译
State-of-the-art activity recognizers are effective during the day, but not trustworthy in the dark. The main causes are the distribution shift from the lower color contrast as well as the limited availability of labeled dark videos. Our goal is to recognize activities in the dark as well as in the day. To compensate for the lack of labeled dark videos, we introduce a pseudo-supervised learning scheme, which utilizes task-irrelevant unlabeled dark videos to train an activity recognizer. Our proposed activity recognizer makes use of audio which is invariant to illumination. However, the usefulness of audio and visual features differs according to the illumination. Thus we propose to make our audio-visual recognizer `darkness-aware'. Experiments on EPIC-Kitchens, Kinetics-Sound, and Charades demonstrate that our proposals enable effective activity recognition in the dark and can even improve robustness to occlusions.
translated by 谷歌翻译
Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译